37 research outputs found

    Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search

    Full text link
    Text-based person search aims to retrieve images of a person from an image database given a sentence describing that person, and it holds great potential for applications such as video surveillance. Extracting the visual content that corresponds to the human description is the key to this cross-modal matching problem. Moreover, correlated images and descriptions involve different granularities of semantic relevance, which is usually ignored by previous methods. To exploit the multi-level corresponding visual content, we propose a pose-guided multi-granularity attention network (PMA). First, we propose a coarse alignment network (CA) that selects the image regions related to the global description via similarity-based attention. To further capture phrase-related visual body parts, we propose a fine-grained alignment network (FA), which employs pose information to learn the latent semantic alignment between visual body parts and textual noun phrases. To verify the effectiveness of our model, we perform extensive experiments on the CUHK Person Description Dataset (CUHK-PEDES), which is currently the only available dataset for text-based person search. Experimental results show that our approach outperforms the state-of-the-art methods by 15% in terms of the top-1 metric. Comment: published in AAAI 2020 (oral).
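
    The coarse-alignment step described above is essentially similarity-based attention between image regions and a global sentence embedding. The sketch below illustrates that idea in PyTorch; the function name and tensor shapes are assumptions for illustration, not the paper's exact CA module.

    import torch
    import torch.nn.functional as F

    def coarse_alignment(region_feats: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        # region_feats: (N, R, D) image-region embeddings; text_feat: (N, D) global description embedding.
        # Cosine similarity between each region and the global description.
        sim = F.cosine_similarity(region_feats, text_feat.unsqueeze(1), dim=-1)  # (N, R)
        # Turn similarities into attention weights and pool the description-related regions.
        attn = F.softmax(sim, dim=-1).unsqueeze(-1)                              # (N, R, 1)
        return (attn * region_feats).sum(dim=1)                                  # (N, D)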

    FreeU: Free Lunch in Diffusion U-Net

    Full text link
    In this paper, we uncover the untapped potential of the diffusion U-Net, which serves as a "free lunch" that substantially improves generation quality on the fly. We first investigate the key contributions of the U-Net architecture to the denoising process and find that its main backbone primarily contributes to denoising, whereas its skip connections mainly introduce high-frequency features into the decoder module, causing the network to overlook the backbone semantics. Capitalizing on this discovery, we propose a simple yet effective method, termed "FreeU", that enhances generation quality without additional training or fine-tuning. Our key insight is to strategically re-weight the contributions from the U-Net's skip connections and backbone feature maps, so as to leverage the strengths of both components of the U-Net architecture. Promising results on image and video generation tasks demonstrate that FreeU can be readily integrated into existing diffusion models, e.g., Stable Diffusion, DreamBooth, ModelScope, Rerender, and ReVersion, to improve generation quality with only a few lines of code. All you need is to adjust two scaling factors during inference. Project page: https://chenyangsi.top/FreeU/. Comment: Project page: https://chenyangsi.top/FreeU/.
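
    Since the abstract says FreeU only re-weights the U-Net's backbone and skip-connection features with two inference-time scaling factors, a minimal sketch of that idea is given below. The factor names b and s and the plain channel-wise scaling are assumptions for illustration; the released method also applies part of the skip attenuation in the Fourier domain, so consult the project page for the exact formulation.

    import torch

    def freeu_reweight(backbone_feat: torch.Tensor, skip_feat: torch.Tensor,
                       b: float = 1.2, s: float = 0.9) -> torch.Tensor:
        # Amplify the backbone (semantic) features to strengthen denoising.
        backbone_feat = backbone_feat * b
        # Dampen the skip-connection (high-frequency) features; a plain scale is used
        # here for brevity instead of the spectral modulation used in the paper.
        skip_feat = skip_feat * s
        # Concatenate along channels, as a U-Net decoder block would before convolution.
        return torch.cat([backbone_feat, skip_feat], dim=1)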

    MetaFormer Is Actually What You Need for Vision

    Full text link
    Transformers have shown great potential in computer vision tasks. A common belief is that their attention-based token mixer module contributes most to their competence. However, recent works show that the attention-based module in Transformers can be replaced by spatial MLPs and the resulting models still perform quite well. Based on this observation, we hypothesize that the general architecture of the Transformers, rather than the specific token mixer module, is more essential to the model's performance. To verify this, we deliberately replace the attention module in Transformers with an embarrassingly simple spatial pooling operator that conducts only basic token mixing. Surprisingly, we observe that the derived model, termed PoolFormer, achieves competitive performance on multiple computer vision tasks. For example, on ImageNet-1K, PoolFormer achieves 82.1% top-1 accuracy, surpassing the well-tuned Vision Transformer/MLP-like baselines DeiT-B/ResMLP-B24 by 0.3%/1.1% accuracy with 35%/52% fewer parameters and 50%/62% fewer MACs. The effectiveness of PoolFormer verifies our hypothesis and urges us to initiate the concept of "MetaFormer", a general architecture abstracted from Transformers without specifying the token mixer. Based on extensive experiments, we argue that MetaFormer is the key player in achieving superior results for recent Transformer and MLP-like models on vision tasks. This work calls for more future research dedicated to improving MetaFormer instead of focusing on the token mixer modules. Additionally, our proposed PoolFormer could serve as a starting baseline for future MetaFormer architecture design. Code is available at https://github.com/sail-sg/poolformer. Comment: CVPR 2022 (Oral). Code: https://github.com/sail-sg/poolformer.
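
    To make the "pooling as token mixer" idea concrete, here is a rough PoolFormer-style block in PyTorch: pooling-based token mixing followed by a channel MLP, each with a residual connection. It is a simplified sketch of the design described above, not a drop-in copy of the released code; see the linked repository for the reference implementation.

    import torch
    import torch.nn as nn

    class PoolFormerBlock(nn.Module):
        def __init__(self, dim: int, pool_size: int = 3, mlp_ratio: int = 4):
            super().__init__()
            self.norm1 = nn.GroupNorm(1, dim)   # channel-wise norm over (N, C, H, W)
            # Pooling-based token mixer; subtracting the input keeps only the mixing term.
            self.pool = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2,
                                     count_include_pad=False)
            self.norm2 = nn.GroupNorm(1, dim)
            self.mlp = nn.Sequential(
                nn.Conv2d(dim, dim * mlp_ratio, 1),
                nn.GELU(),
                nn.Conv2d(dim * mlp_ratio, dim, 1),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
            y = self.norm1(x)
            x = x + (self.pool(y) - y)           # token mixing via average pooling
            x = x + self.mlp(self.norm2(x))      # channel mixing
            return x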

    Tailoring surface hydrophilicity of porous electrospun nanofibers to enhance capillary and push-pull effects for moisture wicking

    Get PDF
    In this article, the liquid moisture transport behavior of dual-layer electrospun nanofibrous mats is reported for the first time. The dual-layer mats consist of a thick layer of hydrophilic polyacrylonitrile (PAN) nanofibers and a thin layer of hydrophobic polystyrene (PS) nanofibers, the latter prepared with or without interpenetrating nanopores. The mats are coated with polydopamine (PDOPA) to different extents to tailor the water wettability of the PS layer. It is found that, with a large quantity of nanochannels, the porous PS nanofibers exhibit a stronger capillary effect than the solid PS nanofibers. The capillary motion in the porous PS nanofibers can be further enhanced by slight surface modification with PDOPA while retaining the large hydrophobicity difference between the two layers, inducing a strong push–pull effect that transports water from the PS layer to the PAN layer.

    Gene Delivery to Nonhuman Primate Preimplantation Embryos Using Recombinant Adeno-Associated Virus

    Get PDF
    Delivery of genome editing tools to mammalian zygotes has revolutionized animal modeling. However, mechanical delivery methods for introducing genes and proteins into zygotes remain a challenge for some animal species that are important in biomedical research. Here, an approach for achieving gene delivery and genome editing in nonhuman primate embryos is presented, based on infecting zygotes with recombinant adeno-associated viruses (rAAVs). Together with previous reports from the authors of this paper and others, this approach is potentially applicable to a broad range of mammals. In addition to genome editing and animal modeling, this rAAV-based method can facilitate gene function studies in early-stage embryos.